Project Summary


Scope: The goal of this research was to develop a comprehensive, computationally tractable framework
for addressing a broad class of problems that entail extracting information very sparsely encoded in high
volume data streams. At its core was a unified vision, centered on the use of dynamical models as
information encapsulators, and blending elements from dynamical systems theory, semi-algebraic geometry,
sparse signal recovery and convex optimization. It included both theory development in an emerging new
field (compressive information extraction) and an investigation of implementation issues.

Relevance to the USAF mission: As emphasized in the Technology Horizons report, flexible, provably
correct autonomy is a key enabler for maintaining the superiority and expanding the capabilities of the USAF
in the next two decades. Autonomous systems endowed with analysis and decision capabilities can collect
data, assess intention, and if necessary, take action, while at the same time substantially reducing the
required manpower and cost, vis-a-vis existing unmanned vehicles. Arguably, a major road-block to realizing
this vision stems from the curse of dimensionality. Simply put, existing techniques are ill-equipped to deal
with the overwhelming volume of data that needs to be analyzed in real time. This is precisely the challenge
addressed by this research: development of a computationally tractable framework for robustly extracting
and processing actionable information sparsely encoded in very large data sets. The long term vision was
to lay the foundations for designing systems endowed with provably correct flexible autonomy, capable of
making decisions in-situ, with minimal human intervention.

Contributions to Basic Science: This research effort took the first steps towards developing a new
framework (compressive information extraction) that allows for robustly extracting and processing information
sparsely encoded in very large data sets. At its core is a new approach that exploits a hitherto unexplored
connection between information extraction and nonlinear identification. It advanced the state of the art in
control theory by developing a tractable framework for robust identification and model (in)validation of
switched systems, a key component of a comprehensive control framework for hybrid systems. In addition, this
research generalized the existing compressive sensing framework to a dynamic setting, thus substantially
extending its domain of application. Further, it unveiled deep connections between the problems addressed and
those arising in other branches of engineering and applied mathematics. Examples include the connection
between nonlinear dimensionality reduction methods and manifold discovery (both hallmarks of machine
learning) and nonlinear identification, and between rank minimization and dynamic data interpolation.

Benefits  to  the  General  Public:  In  addition  to  directly  supporting  the  USAF  mission,  the  results  obtained 
in  this  research  effort  have  the  potential  to  significantly  benefit  society.  Systems  endowed  with  activity 
analysis  capabilities  can  prevent  crime,  allow  elderly  people  to  continue  living  independently,  give  early 
warning  of  serious  medical  conditions,  for  instance  by  detecting  minute  gait  alterations  preceding  a  stroke, 
inspect  aging  civil  infrastructures,  and  monitor  and  even  coordinate  responses  to  environmental  threats  to 
minimize  their  effect.  Initial  steps  have  been  taken  to  transition  the  technology  developed  under  this  grant 
to  TSA  in  order  to  enhance  security  at  US  airports.  A  prototype  system  has  been  installed  at  the  Cleveland 
airport,  where  it  successfully  detected  security  breaches  unnoticed  by  humans. 
1 Motivation


The goal of this research was to develop a comprehensive, computationally tractable framework for
addressing a broad class of problems that entail extracting information very sparsely encoded in high volume
data streams. It was based on a unified vision, centered on the use of dynamical models as information
encapsulators, that emphasized robustness and computational complexity issues. It included both theory
development in an emerging new field (compressive information extraction [7]) and an investigation of
implementation issues.


1.1  Transformative  Impact  and  Relevance  to  the  USAF  Mission 


As emphasized in the Technology Horizons report, flexible, provably correct autonomy is a key enabler
for maintaining the superiority and expanding the capabilities of the USAF in the next two decades:
autonomous systems endowed with analysis and decision capabilities can collect data, assess intention, and
if necessary, take action, while at the same time substantially reducing the required manpower and cost,
vis-a-vis existing unmanned vehicles. Arguably, a major road-block to realizing this vision stems from the
curse of dimensionality, illustrated in Figure 1. Simply put, existing techniques are ill-equipped for
analyzing the “data avalanche” generated by the sensors, within the constraints imposed by the need for robust,
real time operation in dynamic, partially stochastic scenarios. This was precisely the issue addressed in this
project, by exploiting recent advances in robust dynamical systems, sparse signal recovery, semi-algebraic
geometry and optimization. The long term vision that motivated this project was to lay the foundations for
designing systems endowed with provably correct flexible autonomy, capable of making decisions in-situ,
without human intervention, while passing on to the next decision level only mission-relevant situational
abstractions.


Figure 1: Examples of actionable information sparsely encoded in very large data streams. (a) Target tracking in
an urban canyon; (b) and (c) sample frames showing contextually abnormal events: onset of a tunnel fire and a
person entering through an exit; (d) tracking multiple targets. In all cases decisions must be taken based on events
discernible only in a small fraction (less than $O(10^{-6})$) of a very large data record: the video sequences in (a)-(d) add
up to megabytes, yet the useful information (a change of behavior of a single target) is contained in just a few frames.


The  main  idea  that  drove  this  research  was  to  recast  the  problem  of  sparse  information  extraction  into 
a  hybrid  systems  identification/model  (in)validation  form.  Briefly,  in  this  approach,  the  observed  data  is 
treated  as  the  output  of  an  underlying  switched  dynamical  system,  typically  represented  by  a  difference 
inclusion,  with  jumps  indicating  the  occurrence  of  events.  The  key  observation  is  the  fact  that  higher  degrees 
of  spatio-temporal  correlations  in  the  data  lead  to  lower  complexity  joint  models,  allowing  for  reformulating 
the  problem  of  information  extraction  into  a  dynamic  sparsification  form,  which  in  turn  can  be  reduced  to  a 
convex  semidefinite  optimization  problem. 


A  conceptual  diagram  illustrating  these  ideas  is  shown  in  Figure  2.  Notably,  merely  postulating  the 
existence  of  a  dynamically  sparse  underlying  model  led  to  efficient,  scalable  algorithms  for  information 
extraction.  For  instance,  in  this  context,  data  can  be  segmented  by  detecting  changes  in  suitable  model 
invariants  (such  as  complexity),  a  process  that  can  be  reduced  to  minimizing  the  rank  of  a  matrix  directly 
constructed from the data. Similarly, interpolating missing data and determining whether two data streams
correspond to time traces of the same phenomenon (for instance the same activity) reduces to a tractable
semidefinite optimization, even in cases where the data has no time overlap.

Sparse Signal Recovery:
  Strong prior: the signal $f(t)$ has a sparse representation with coefficients $c_1, \ldots, c_n$, with only a few $c_i \neq 0$.
  Signal recovery: sparsify the coefficients: $\min \|[c_1, \ldots, c_n]\|_0$ subject to $f(t) = y(t)$.
  Relax to Linear Programming: $\min \|[c_1, \ldots, c_n]\|_1$ subject to $f(t) = y(t)$.

Sparse Information Recovery:
  Strong prior: actionable information is generated by a low complexity dynamical system.
  Information recovery: sparsify the dynamics: $\min_y \operatorname{rank} H(y)$ with

  $$ H(y) = \begin{bmatrix} y(1) & y(2) & \cdots & y(n) \\ y(2) & y(3) & \cdots & y(n+1) \\ \vdots & \vdots & \ddots & \vdots \\ y(m) & y(m+1) & \cdots & y(m+n-1) \end{bmatrix} $$

  Relax to Semidefinite Programming: $\min \operatorname{Trace} X(y)$ subject to $L(y) \succeq 0$.

Figure 2: Sparse dynamical information recovery versus sparse signal recovery.
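To make the link between model complexity and the rank of a data matrix concrete, the following Python sketch builds a Hankel matrix of the kind shown in Figure 2 and estimates its numerical rank from its singular values. It is an illustration only, not code from this project: the helper names `hankel_matrix` and `numerical_rank`, the tolerance and the test signals are all assumptions made for the example.

```python
import numpy as np

def hankel_matrix(y, n_cols):
    """Stack shifted copies of y: row i holds y[i], ..., y[i + n_cols - 1]."""
    y = np.asarray(y, dtype=float)
    n_rows = len(y) - n_cols + 1
    return np.array([y[i:i + n_cols] for i in range(n_rows)])

def numerical_rank(matrix, rel_tol=1e-8):
    """Count singular values larger than rel_tol times the largest one."""
    s = np.linalg.svd(matrix, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

t = np.arange(60)
# A sampled sinusoid obeys a second-order linear recursion, so it is the
# output of a low complexity (order 2) dynamical system.
smooth = np.cos(0.3 * t)
# A generic random sequence has no such low-order structure (hypothetical data).
rng = np.random.default_rng(0)
noise_like = rng.standard_normal(60)

rank_smooth = numerical_rank(hankel_matrix(smooth, 10))
rank_noise = numerical_rank(hankel_matrix(noise_like, 10))
```

In this toy setting the Hankel matrix of the sinusoid is rank deficient while that of the random sequence is not, which is the property that the rank minimization formulation exploits.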


2  Description  of  the  Research  Performed  and  Summary  of  the  Results 

In  this  section  we  give  a  brief  summary  of  the  research  performed  under  this  grant  and  our  findings.  Details 
can  be  found  in  the  papers  listed  in  the  publications  section,  which  can  be  downloaded  from  the  Robust 
Systems  Lab  website:  http://robustystems.coe.neu.edu. 


2.1  Basic  Science. 

In principle, embedding information extraction problems in the conceptual world of systems identification
makes available a rich, extremely powerful resource base, leading to computationally tractable, robust
solutions. However, successful application of the ideas outlined above hinged upon the development of
computationally tractable solutions to the following problems, open at the time that the project was started:

2.1.1 Robust identification of hybrid systems: As outlined before, the main idea that drove this research
was to treat the observed data as the output of an underlying switched dynamical system, with events
indicated by changes in invariants associated with each subsystem. In the initial phase of this research, we
assumed that the data record was generated by a piecewise affine model of the form (see footnote 1)

$$ f\left(p_{\sigma(t)}, \ldots\right) = 0 \tag{1} $$

where $f$ is an affine function, the parameter vector $p(t)$ takes values from a finite set indexed by the piecewise
constant function $\sigma(t)$, and $\eta_f(t)$ represents bounded noise. In this context, the information is
encapsulated in the parameter vector $p$. For instance, events are indicated by changes in $p(t)$ (an identification
problem). Similarly, two time series can be considered to have been generated by the same process if they
can be explained by the same $p$ (a model (in)validation problem). While both identification and model
(in)validation of switched affine systems are known to be NP-hard problems, as part of this research we have
developed tractable relaxations, with optimality certificates, for two practically relevant cases:

Footnote 1: Note that this can be assumed without loss of generality, since piecewise affine models are universal approximators.

Identification with minimum number of switches: This scenario arises for instance in fault detection, where
the goal is to minimize the number of false alarms, and in segmentation problems in image processing
and computer vision, where it is often desirable to maximize the size of regions (roughly equivalent to
minimizing the number of boundaries). Formally, this problem can be stated as: Given input/output data
$\{u_t, y_t\}_{t=t_0}^{T}$ over the interval $[t_0, T]$, and a priori information consisting of a convex set membership noise
description $\mathcal{N}$ and bounds $n_u \geq n_c$ and $n_y \geq n_a$ on the order of the regressors, find a switched affine model
of the form:

$$ y_t = \sum_{i=1}^{n_a} a_i(\sigma_t)\, y_{t-i} + \sum_{i=1}^{n_c} c_i(\sigma_t)\, u_{t-i} + \eta_t \tag{2} $$

where $u$, $y$ and $\eta$ denote the input, output and noise, respectively, that explains the experimental data with the
minimum number of switches. The main result of this portion of the research [2] showed that, by defining
the sequence of first order differences $\delta_t = p_t - p_{t+1}$, identification with minimum number of switches
can be reduced to the following sparsification problem:

$$ \min_{p_t} \|p_t - p_{t+1}\|_0 \quad \text{subject to } y_t - r_t^T p_t \in \mathcal{N} \;\; \forall t \tag{3} $$

Notably, as we showed in [2], when the noise is characterized in terms of its $\ell_\infty$ norm, that is
$\mathcal{N} = \{\eta : \|\eta\|_\infty \leq \epsilon\}$, then an exact solution can be found by solving a sequence of Linear Programming
problems. This is one of the very few sparsification problems where exact recovery is guaranteed, without the
need for additional conditions on the data, such as incoherence.
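The flavor of the sequential approach can be conveyed with a small Python sketch. It is not the algorithm of [2]; it is a simplified illustration under loudly stated assumptions: a first-order model $y_t = a(\sigma_t) y_{t-1} + \eta_t$ with a single unknown parameter and $|\eta_t| \leq \epsilon$, for which the Linear Programming feasibility test "is there one parameter that explains every sample in this segment?" reduces to intersecting intervals. A segment is extended greedily until no single parameter is feasible, at which point a switch is declared. The function names and the data are hypothetical.

```python
def interval_for_sample(y_prev, y_curr, eps):
    """Values of a with |y_curr - a * y_prev| <= eps, as (lo, hi); None if there are none."""
    if y_prev == 0.0:
        # The sample does not constrain a; it is explainable only if |y_curr| <= eps.
        return (float("-inf"), float("inf")) if abs(y_curr) <= eps else None
    lo, hi = (y_curr - eps) / y_prev, (y_curr + eps) / y_prev
    return (min(lo, hi), max(lo, hi))

def greedy_segments(y, eps):
    """Return the time indices at which a new parameter value has to start."""
    switches = []
    lo, hi = float("-inf"), float("inf")   # feasible parameter set of the current segment
    for t in range(1, len(y)):
        box = interval_for_sample(y[t - 1], y[t], eps)
        if box is None:
            raise ValueError(f"sample {t} cannot be explained by any parameter")
        new_lo, new_hi = max(lo, box[0]), min(hi, box[1])
        if new_lo > new_hi:                # no single parameter explains the whole segment
            switches.append(t)
            new_lo, new_hi = box           # start a new segment at sample t
        lo, hi = new_lo, new_hi
    return switches

# Hypothetical data: parameter 0.9 for t = 1..10, then 1.1, with small bounded noise.
y = [1.0]
for t in range(1, 21):
    a = 0.9 if t <= 10 else 1.1
    y.append(a * y[-1] + 0.001 * (-1) ** t)

switches = greedy_segments(y, eps=0.002)
```

With several unknown parameters per submodel the interval intersection would be replaced by a Linear Programming feasibility problem over the parameter vector, with one pair of linear inequalities per sample.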


Identification with bounded number of subsystems. In this case, the problem can be formally stated as:
Given input/output data over the interval $[t_0, T]$ and a bound on the $\ell_\infty$ norm of the noise (i.e. $\|\eta\|_\infty \leq \epsilon$), find
a switched ARX model of the form (2), with no more than $s$ subsystems, that interpolates the experimental
data. Although in principle this problem is NP-hard, in the noise free case (i.e. $\eta_t = 0 \;\; \forall t$), it can be
reduced to finding the null space of a suitably constructed matrix, followed by polynomial differentiation.
The starting point to accomplish this is to rewrite (2) as

$$ b(\sigma_t)^T r_t = 0 \tag{4} $$

where $r_t = [-y_t, y_{t-1}, \ldots, y_{t-n_a}, u_{t-1}, \ldots, u_{t-n_c}]^T$ and $b(\sigma_t) = [1, a_1(\sigma_t), \ldots, a_{n_a}(\sigma_t), c_1(\sigma_t), \ldots, c_{n_c}(\sigma_t)]^T$
denote the regressor and (unknown) coefficient vectors at time $t$, respectively. The idea behind the
Generalized Principal Component Analysis (GPCA) method is to decouple the identification of model parameters
from the identification of the switching sequence by noting that (4) holds for some $\sigma_t$ if and only if

$$ p_s(r_t) = \prod_{i=1}^{s} \left(b_i^T r_t\right) = c_s^T \nu_s(r_t) = 0 \tag{5} $$

holds for all $t$ independent of which of the $s$ submodels is active at time $t$, where $b_i \in \mathbb{R}^{n_a+n_c+1}$ and
$\nu_s(\cdot)$ denote the parameter vector corresponding to the $i$th submodel and the Veronese map of degree $s$,
respectively. Collecting all data into a matrix form leads to:

$$ V_s c_s = \begin{bmatrix} \nu_s(r_{t_0})^T \\ \vdots \\ \nu_s(r_T)^T \end{bmatrix} c_s = 0 \tag{6} $$



DISTRIBUTION  A:  Distribution  approved  for  public  release 


Hence, one can solve for a vector $c_s$ in the null space of $V_s$ to find the coefficients of the multivariate
polynomial $p_s(r)$. Unfortunately, this approach breaks down in the presence of noise, since (5) no longer
holds. Rather, we have the following (noisy) equivalent

$$ p_s(\tilde{r}_t) = \prod_{i=1}^{s} \left(b_i^T \tilde{r}_t\right) = c_s^T \nu_s(\tilde{r}_t) = 0 \tag{7} $$

where $\tilde{r}_t = [-y_t + \eta_t, y_{t-1}, \ldots, y_{t-n_a}, u_{t-1}, \ldots, u_{t-n_c}]^T$, and its associated “noisy” data matrix
$\tilde{V}_s(r, \eta) = V_s(\tilde{r})$. The main difficulty here is that finding the coefficients of the polynomial $p_s(r)$ now requires
finding both an admissible noise sequence $\eta^o$ and a vector $c^o$ such that

$$ V_s(\tilde{r}^o)\, c^o = 0 \tag{8} $$

Since $V_s(\tilde{r})$ is now a matrix polynomial function of the unknown noise sequence $\eta_t$, this is a
computationally very challenging problem. Nevertheless, as we showed in [32], the use of polynomial optimization
allows for transforming (8) to a rank minimization problem of the form

Va(rt,mW)c°  =  0 
subject to $M(m) \succeq 0$ and $L(m) \succeq 0$

where all the matrices involved are affine in the optimization variables $m$. At this point, a tractable convex
relaxation can be obtained by using the nuclear norm as a surrogate for rank, leading to a convex
semidefinite program that can be solved using widely available tools (see [24, 32] for details).
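The noise free GPCA step described by equations (4)-(6) can be illustrated with a short Python sketch. It is a toy instance under stated assumptions, not the implementation used in this research: two hypothetical first-order submodels $y_t = a\, y_{t-1} + c\, u_{t-1}$, regressors $r_t = [-y_t, y_{t-1}, u_{t-1}]^T$, and a degree 2 Veronese map. The coefficient vector is taken from the null space of the embedded data matrix, and the parameters active at each time are recovered by differentiating the polynomial at the corresponding regressor. All names and numerical values are illustrative.

```python
import numpy as np

def veronese2(r):
    """Degree-2 Veronese map of a 3-vector: all monomials r_i * r_j with i <= j."""
    r1, r2, r3 = r
    return np.array([r1 * r1, r1 * r2, r1 * r3, r2 * r2, r2 * r3, r3 * r3])

def gradient_p(c, r):
    """Gradient with respect to r of p(r) = c^T veronese2(r)."""
    c11, c12, c13, c22, c23, c33 = c
    Q = np.array([[2 * c11, c12, c13],
                  [c12, 2 * c22, c23],
                  [c13, c23, 2 * c33]])
    return Q @ r

# Hypothetical submodels, one row (a, c) per mode, and a random switching sequence.
params = np.array([[0.5, 1.0], [-0.7, 2.0]])
rng = np.random.default_rng(1)
T = 40
modes = rng.integers(0, 2, size=T)
u = rng.standard_normal(T)

y = np.zeros(T + 1)
y[0] = 1.0
regressors = []
for t in range(T):
    a, c = params[modes[t]]
    y[t + 1] = a * y[t] + c * u[t]                 # noise free switched ARX data
    regressors.append([-y[t + 1], y[t], u[t]])     # b(sigma_t)^T r_t = 0 with b = [1, a, c]
R = np.array(regressors)

# Equation (6): the coefficient vector spans the null space of the embedded data.
V = np.array([veronese2(r) for r in R])
_, sing, Vt = np.linalg.svd(V)
c_s = Vt[-1]

# At a regressor generated by submodel i, the gradient of p_s is parallel to b_i.
estimates = np.array([gradient_p(c_s, r) for r in R])
estimates = estimates / estimates[:, [0]]          # normalise the first entry to 1
```

Each row of `estimates` approximates $[1, a(\sigma_t), c(\sigma_t)]$ without the switching sequence ever being estimated explicitly. The sketch relies on exact (noise free) data; with noise the null space no longer exists, which is what motivates the moments-based relaxation discussed above.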

2.1.2  Extensions  to  missing  data.  In  most  practical  scenarios  only  partial  data  is  available,  due  for  instance 
to  occlusion  or  limited  sensing/transmitting  capabilities.  In  these  situations,  it  is  of  interest  to  estimate 
the  missing  data,  for  instance  in  order  to  perform  data  association  (e.g.  stitch  tracklets),  or  to  uncover 
correlations  mediated  by  the  missing  elements.  We  have  shown  that  this  interpolation  can  be  reduced  to 
a  rank  minimization  problem,  which  in  turn  (due  to  its  Hankel  structure)  can  be  efficiently  solved  using 
convex  relaxations.  These  theoretical  results  enabled  the  development  of  a  new  class  of  algorithms,  based 
upon  polynomial  optimization,  capable  of  handling  both  time  and  frequency  domain  constraints,  as  well  as 
constraints  on  the  order  of  the  interpolant  [5]. 

2.1.3 Robust Identification of Hammerstein/Wiener Systems. In the context of information extraction,
this problem arises naturally as a way of handling the extremely high volume of data involved. Note that
this high volume is counterbalanced by a high degree of spatio-temporal correlations: for instance, pixels
in a video sequence do not evolve independently. This feature can be exploited to substantially reduce
the dimensionality of the problem by embedding the raw data in low dimensional manifolds. Since the
projections to/from these manifolds can be modeled as memoryless (possibly time varying) nonlinearities,
this approach leads, locally, to the Hammerstein/Wiener system identification problem illustrated in Fig. 3.
Motivated by polynomial kernel embeddings, in order to solve the problem, the given temporal data $\{y_t\}$
was embedded, via a nonlinear projection $\xi_t = \Pi(y_t)$, in manifolds where its evolution could be (locally)
explained by a linear model of the form $\xi_k = \sum_{i=1}^{n_a} a_i \xi_{k-i} + \eta_k$, where $\eta_k$ accounted for approximation
error. Next, to each embedded time series, we associated a Hankel matrix $H_\xi$ built from the embedded samples.
Figure 3: Left: Manifold embedding as a nonlinear identification: Here $\Pi_i(\cdot)$ and $\Pi_o(\cdot)$ are memoryless
nonlinearities, $(u \in \mathbb{R}^{n_u}, y \in \mathbb{R}^{n_y})$ and $(e \in \mathbb{R}^{n_e}, \xi \in \mathbb{R}^{n_\xi})$, with $n_u \gg n_e$ and $n_y \gg n_\xi$, represent the (high dimensional) raw
data and its projection on the low dimensional manifold, respectively, and the piecewise linear dynamics $G(\cdot)$ governs
the evolution of data on the manifold. Center: 3 dimensional manifold obtained applying our approach to the KTH
boxing sequence shown on the right. Note that the outlier has minimal influence on the manifold structure.


Since the vector $w = \{a_1, \ldots, a_{n_a}\}$ satisfies $H_\xi w = 0$, it follows that the dynamic data is completely
characterized by the null space of its associated Hankel matrix. Inspired by maximum margin classification
algorithms and the result above, we developed a maximum margin dynamics-based classification algorithm
that worked with the null spaces of Hankel matrices [15]. Given two sets of training data $\{y_t^+\}$ and $\{y_t^-\}$,
corresponding to nominal and anomalous scenarios, we jointly sought embeddings $\xi^+$, $\xi^-$ and a vector
$w$ that minimize $\gamma$ subject to $\|H_{\xi^+} w\|_2^2 \leq \gamma$ and $\|H_{\xi^-} w\|_2^2 \geq 1 + \gamma$. Intuitively, we sought a vector
$w$ such that (i) it approximately lay in the null space of the Hankel matrices of all the positive example
sequences, and (ii) it maximized the margin between the residue $\|H_\xi w\|_2$ for the nominal and
anomalous sequences. Defining the Kernel (or Gram) matrix by its submatrices


1  '  '  '  €j€j+c 

Cj+iCj  Cj+iCj+i  ■  ■  ■  Cj+iCj+c 

_Cj+c£j  Cj+cCj+i  ■  ■  ■  Cj+cCj+c_ 


and noting that $G_i = H_i^T H_i$ is a linear function of the entries of this matrix allowed us to reduce the
problem outlined above to (see [15] for details):


min  k,w^  2  I  Ml!  +  ^7  ‘ 

subject  to:  wT G{W  <  7 


-  ATrace(iT) 
,  VG*  G  G+ 


wT GiW  H-  7  —  1-  5  VG^  G  G— 


■>T— c+l 


K„ 


ho 


si  _  1  —c 

~  Z^j= 1 

k  y  0,  7  >  0 

(1  -  e) 1 1 yi  -  yj ||2  <  ku  +  kjj  -  2 hj  <  (1  +  e)|| Vi  ~  Vj\? 


(10) 


where the last constraint approximately enforced preservation of the local spatial geometry and where the
additional term $-\lambda\,\mathrm{Trace}(K)$ in the objective sought to favor lower dimensional embeddings. The
semi-algebraic problem was solved by using recent results in sparse polynomial optimization, which exploit its
inherent sparsity to substantially reduce the computational burden [15].


2.1.4 Model (In)Validation: Consider now the problem of establishing whether a noisy input/output
sequence could have been generated by a given model of the form (2), possibly subject to model uncertainty.
Classically, model (in)validation has been used as an intermediate step following identification and prior to
using the identified models for control synthesis. Interestingly, as we established in the course of this research,
the same ideas can be used in the context of information extraction to identify contextually abnormal
sequences (see section 2.2.3). Formally, the problem of interest can be stated as establishing whether a noisy
input/output sequence could have been generated by a given model of the form:

$$ \hat{y}_t = \sum_{i=1}^{n_a} a_i(\sigma_t)\, \hat{y}_{t-i} + \sum_{i=1}^{n_c} c_i(\sigma_t)\, u_{t-i}, \qquad y_t = \hat{y}_t + \eta_t, \quad \sigma_t \in \{1, \ldots, s\}, \quad \|\eta\|_\infty \leq \epsilon \tag{11} $$
where $y_t$ denotes the measured output corrupted by the noise $\eta_t$. As in the identification case, this problem
is known to be generically NP-hard, due to the presence of noise and the fact that the mode variable $\sigma_t$
is not directly measurable. Cases where $\sigma_t$ takes only a small number of discrete values (for instance a
system switching between two known modes) can be handled by simply considering all possible switching
sequences. Clearly, due to its combinatorial nature, this approach becomes infeasible even for cases involving
a relatively small number of subsystems. On the other hand, this combinatorial complexity can be avoided by
appealing to semi-algebraic geometry tools. To this effect, begin by noting that (11) holds if and only if:

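$$ p_t(\eta_{t:t-n_a}) = \prod_{i=1}^{s} \left[g_{t,i}(\eta_{t:t-n_a})\right] = 0 \tag{12} $$

where:

$$ g_{t,i}(\eta_{t:t-n_a}) = a_1(i)(y_{t-1} - \eta_{t-1}) + \cdots + a_{n_a}(i)(y_{t-n_a} - \eta_{t-n_a}) - (y_t - \eta_t) + c_1(i)\, u_{t-1} + \cdots + c_{n_c}(i)\, u_{t-n_c} \tag{13} $$

Similarly, the norm constraint on the noise sequence $\eta_t$ is equivalent to the polynomial constraint
$h_t(\eta_t) = \epsilon^2 - \eta_t^2 \geq 0$. Hence, there exist noise and switching sequences such that (12) holds if and only if the
semi-algebraic set

$$ T(\eta) = \left\{\eta \mid h_t(\eta_t) \geq 0 \;\; \forall t \in [t_0, T] \text{ and } p_t(\eta_{t:t-n_a}) = 0 \;\; \forall t \in [n_a, T]\right\} \tag{14} $$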


is not empty. Thus, an (in)validation certificate can be obtained by considering the following optimization
problem:

$$ o^* = \min_{\eta} \sum_{t=n_a}^{T} p_t^2(\eta_{t:t-n_a}) \quad \text{s.t. } h_t(\eta_t) \geq 0 \;\; \forall t \in [0, T] $$

Note that $o^* > 0 \iff T(\eta) = \emptyset$. While computing $o^*$ requires solving a computationally challenging
polynomial optimization problem, a convergent sequence of lower bounds can be obtained using recent
results from semi-algebraic optimization, leading to a sequence of problems of the form:

$$ d_N^* = \min_{m} \sum_{t=n_a}^{T} l_t(m) \quad \text{s.t. } M_N(m_{t-n_a:t}) \succeq 0 \;\; \forall t \in [n_a, T], \quad L_N(h_t\, m_{t-n_a:t}) \succeq 0 \;\; \forall t \in [n_a + 1, T] $$

where $l_t$ is a linear functional of the optimization variables $m$ and where $M_N$ and $L_N$ are matrices affine in
these variables. Hence, these problems can be efficiently solved by using commonly available semidefinite
optimization solvers. It is worth emphasizing that this reformulation allows for exploiting the inherently
sparse structure of the problem, resulting in substantial computational complexity reduction [11, 28].
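The role of the product polynomial in (12) is easiest to see in the noise free special case $\eta_t = 0$, sketched below in Python. This is an illustration under stated assumptions rather than the (in)validation procedure itself: the candidate submodels are first-order, the data are synthetic, and the function names are hypothetical. The noisy case requires the moments-based relaxation described above and a semidefinite solver, which is not attempted here.

```python
import numpy as np

def mode_residuals(models, y, u, t):
    """g_{t,i} of (13) with eta = 0, for every candidate first-order submodel (a, c)."""
    return np.array([a * y[t - 1] - y[t] + c * u[t - 1] for a, c in models])

def decoupling_values(models, y, u):
    """p_t = prod_i g_{t,i}; all values vanish iff some switching sequence fits the data exactly."""
    return np.array([np.prod(mode_residuals(models, y, u, t)) for t in range(1, len(y))])

def simulate(models, modes, u, y0=1.0):
    """Noise free output of a switched first-order ARX model for a given mode sequence."""
    y = [y0]
    for t, m in enumerate(modes):
        a, c = models[m]
        y.append(a * y[-1] + c * u[t])
    return np.array(y)

rng = np.random.default_rng(2)
T = 30
u = rng.standard_normal(T)
candidate = [(0.5, 1.0), (-0.7, 2.0)]        # hypothetical model to be (in)validated

# Data generated by the candidate model under an arbitrary, unknown switching sequence.
y_ok = simulate(candidate, rng.integers(0, 2, size=T), u)
# Data generated by a different (hypothetical) system.
y_bad = simulate([(0.9, 0.3)], np.zeros(T, dtype=int), u)

p_ok = decoupling_values(candidate, y_ok, u)
p_bad = decoupling_values(candidate, y_bad, u)
```

The values in `p_ok` vanish up to rounding even though the switching sequence is never estimated, while `p_bad` contains nonzero entries, so the second record invalidates the candidate model.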

2.1.5 Robust estimation under $\ell_\infty$ bounded disturbances. Traditional noise models often do not capture
key  features  of  the  problems  of  interest  here.  As  a  simple  example,  noise  in  images  should  be  bounded. 
While  in  principle  this  feature  can  be  captured  using  truncated  distributions,  the  resulting  problems  are 
computationally  hard.  To  circumvent  this  difficulty  we  developed  a  new  framework  for  robust  estimation 
in  the  presence  of  unknown-but-bounded  noise.  Using  a  concept  similar  to  superstability  led  to  robust 
filters  that  can  be  synthesized  by  simply  solving  a  linear  programming  problem  [1].  A  salient  feature  of  this 
framework  is  that  it  explicitly  allows  for  trading  off  filter  complexity  against  worst-case  estimation  error.  We 
have  also  extended  this  framework  to  the  more  challenging  case  where  the  mode  variable  is  not  accessible 
to  the  filter  and  shown  that  the  resulting  problem  can  be  recast  into  a  (polynomial)  sparsification  form  and 
solved using results from semi-algebraic geometry [13]. In addition, we explored a new information-based
complexity framework that combines the properties of traditional worst case and probabilistic estimation
approaches, leading to a substantial reduction in the conservatism of the former, while retaining its ability
to provide bounds on worst case errors [8, 9, 27]. Finally, we have developed computationally tractable
algorithms for synthesizing optimal filters subject to sparsity constraints on the information flow, and for
optimal sensor selection. The main result of [19] showed that, surprisingly, the first problem is convex,
while in [16] we showed that, although the second problem is non-convex, tractable convex relaxations with
optimality guarantees can be obtained using tools from semi-algebraic geometry.


2.2  Application:  Detecting  Contextually  Abnormal  Events: 


We applied the theoretical framework described in section 2.1 to a problem at the core of flexible autonomy:
detecting contextually abnormal activities. Solving this problem in realistic, potentially adversarial
environments required the ability to (i) perform persistent tracking, (ii) detect significant events, and (iii) recognize
activities from noisy, fragmented data records. As briefly outlined below, the framework developed in this
research indeed leads to robust, computationally tractable solutions to these problems.


Figure 4: Using a Hammerstein/Wiener system to achieve sustained tracking in the presence of appearance changes.
The  left  and  right  portions  of  each  frame  show  the  actual  and  predicted  target  appearance,  with  their  correlation 
displayed  in  the  top  left  corner. 


Figure  5:  Using  dynamics  to  track  targets  with  similar  appearance 


2.2.1 Tracking via Robust Nonlinear Operator Embeddings. The ability to persistently track and
disambiguate is a key enabler for flexible autonomy. However, this process is far from trivial in urban
environments due to occlusion and target appearance changes, compounded by the (potential) existence of multiple
targets with similar appearance. In the context of this research, we overcame these barriers by using our
identification framework to map the data (in this case image features) to points on low dimensional
manifolds where dynamics are locally linear time invariant, effectively decoupling appearance (embedded in the
manifold structure) from intrinsic dynamics. The resulting models were used to reconstruct missing data,
predict future target positions and disambiguate targets. The potential of this approach is illustrated in
Figure 4, where it was used to achieve sustained tracking in the presence of extreme appearance changes, due
to a target U-turn. In addition, exploiting the dynamical information allowed for sustained tracking under
substantial occlusion [6], and for disambiguating multiple targets with similar appearances, such as those
shown in Fig. 5 [18].
2.2.2 Dynamic data segmentation and event detection. These problems can be embedded in the
proposed identification framework by simply noting that events correspond to mode changes in the
underlying dynamical system and thus can be detected by monitoring changes in invariants associated
with individual models. The simplest such invariant is model order, since, intuitively, models associated
with homogeneous data, e.g. a single activity, have far lower complexity than those jointly
explaining multiple datasets. Boundaries are thus characterized by an increase in model complexity,
and can be detected by performing a sequence of SVDs of empirical Hankel matrices. An example of the
potential of this approach to detect activity changes from real video feeds is shown in Fig. 6.
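A minimal Python sketch of this idea follows. It is an idealized illustration, not the video segmentation pipeline used in this research: the data are a noise free scalar sequence obtained by concatenating two sinusoids (each of order 2), the window length, number of Hankel columns and rank tolerance are arbitrary choices, and the function names are hypothetical. A boundary is declared at the first sample that pushes the rank of a sliding-window Hankel matrix above the nominal order.

```python
import numpy as np

def hankel_rank(window, n_cols, rel_tol=1e-8):
    """Numerical rank (via SVD) of the Hankel matrix built from one data window."""
    n_rows = len(window) - n_cols + 1
    H = np.array([window[i:i + n_cols] for i in range(n_rows)])
    s = np.linalg.svd(H, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

def first_boundary(y, window_len, n_cols, nominal_rank):
    """Index of the first sample that raises a window's rank above nominal_rank, or None."""
    for start in range(len(y) - window_len + 1):
        if hankel_rank(y[start:start + window_len], n_cols) > nominal_rank:
            return start + window_len - 1   # last sample of the first offending window
    return None

# Hypothetical record: one "activity" up to sample 59, a different one from sample 60 on.
t = np.arange(120)
y = np.where(t < 60, np.cos(0.3 * t), np.cos(0.9 * t))

boundary = first_boundary(y, window_len=20, n_cols=6, nominal_rank=2)
```

With measured data the singular values never vanish exactly, so the hard rank test would have to be replaced by a noise-dependent threshold on the singular values.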

2.2.3 Activity Recognition and Anomaly Detection: The vision driving the application domains
considered in this proposal is that of autonomous systems endowed with the capability to recognize
anomalous behavior. We propose to embed this problem into our identification/model (in)validation
framework as follows. The starting point is to consider activities as second-order stationary
stochastic processes. Thus, each activity can be considered as the output of a time-invariant
dynamical system. Further, projecting the raw data into suitable manifolds allows for decoupling the
effect of “nuisance” factors (such as view-point or appearance changes) from the intrinsic dynamics
of the activity under consideration. Then, given a sequence of frames from a single unknown activity,
recognition can be accomplished by interrogating a database of known activity models to establish
whether it contains an element (and an associated uncertainty description) compatible with the
observed data. A difficulty here is that a single activity can consist of the concatenation of several
sub-activities of various lengths. For instance, a “normal” activity could consist of walking for two
minutes, standing for one, and then resuming walking. However, this is precisely the situation
addressed by the proposed switched model (in)validation framework described in Section 2.1.4.
Advantages of this (in)validation based approach over existing ones include the ability to fully
exploit dynamic information, handle data streams that do not overlap in time, and directly eliminate
the effect of nuisance factors. These advantages are illustrated in Fig. 7 using a simple example
involving two known activities.
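
The switched (in)validation idea can be illustrated with a deliberately simplified sketch. It is not the framework of Section 2.1.4: it assumes scalar autoregressive models, arbitrary switching, and a bounded equation-error term (|e_t| ≤ eps), in which case checking consistency reduces to a pointwise test. Measurement noise, model uncertainty and the convex certificates discussed in the report are not covered. All names (`simulate_switched`, `check_not_invalidated`, `eps`) are hypothetical.

```python
import numpy as np

def simulate_switched(models, labels, y_init):
    """Generate y[t] = sum_k a[k] * y[t-1-k], using model labels[i] at step i.

    `models` is a list of coefficient vectors (possibly of different
    orders); `y_init` must hold at least as many samples as the largest order.
    """
    y = list(y_init)
    for k in labels:
        a = models[k]
        past = y[-1:-len(a) - 1:-1]          # [y[t-1], y[t-2], ...]
        y.append(float(np.dot(a, past)))
    return np.array(y)

def check_not_invalidated(y, models, eps):
    """Test whether y can be explained by switching among `models`.

    Returns (True, None) if at every time step at least one model
    predicts y[t] from its own past within eps, and (False, t) with the
    first offending time index otherwise.
    """
    y = np.asarray(y, dtype=float)
    order = max(len(a) for a in models)
    for t in range(order, len(y)):
        residuals = [abs(y[t] - np.dot(a, y[t - len(a):t][::-1]))
                     for a in models]
        if min(residuals) > eps:
            return False, t
    return True, None
```

In the spirit of Fig. 7, a database holding an oscillatory “walk” model and a constant “wait” model does not invalidate a walk-wait-walk sequence, whereas a sequence that switches to dynamics outside the database is flagged at the first sample that no stored model can explain.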

2.2.4 Finding Causal Interactions in Video Data. In many scenarios, seemingly benign individual actions
can aggregate into potential threats; flash mobs are one example. Thus, as part of this research we
applied our identification framework to the problem of detecting causally interacting individuals.
The main idea, illustrated in Fig. 8, is to recast the problem into a sparse dynamical graphical model
identification form. In this context, each node corresponds to the observed motion of a given target,
and each link indicates the presence of a causal correlation. As we showed in [17], this approach led
to a block-sparsification problem that can be efficiently solved using a modified Group-Lasso type
approach, capable of handling missing data and outliers (due for instance to occlusion and
mis-identified correspondences).


Figure 7: Anomalous behavior detection as a switched (in)validation problem. The activity database
consists of models of two activities, “walk” and “wait”. The top sequence (walk-wait-walk) is not
invalidated since both activities are in the database. The bottom sequence (walk-jump) is flagged as
abnormal since it cannot be generated by switching amongst models in the database.


Figure 6: Fast event detection. The jump in the rank of the Hankel matrix indicates a change in
dynamics as the suspect of a 2010 bombing attack in New York City stops to remove his sweater.
Figure  8:  Finding  causal  interactions  as  a  graph  identification  problem.  Left:  Representation  of  this 
sequence  as  a  graph,  where  each  node  represents  the  time  series  associated  with  the  position  of  each  player 
and  the  links  are  vector  regressive  models.  Causal  interactions  exist  when  one  of  the  time  series  can  be 
explained  as  a  combination  of  past  values  of  the  others.  Right:  Application  of  these  ideas  to  the  problem  of 
finding  causally  interacting  players  in  a  basketball  game. 


Moreover, this approach also identified time instants where the interactions between agents changed,
thus providing event detection capabilities. Efficient computational methods were developed by
combining this idea with the parsimonious model identification framework developed in [4, 20, 29].
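
To make the block-sparsification step concrete, the following is a minimal sketch of a plain Group-Lasso estimate of a sparse vector autoregressive graph, solved by proximal gradient descent. It is not the modified formulation of [17]: it does not handle missing data or outliers, and the function names, the lag order `p`, and the regularization weight `lam` are assumptions chosen for illustration. Each group collects the lagged coefficients of one source node, so a group driven exactly to zero means "no link" from that node.

```python
import numpy as np

def lagged_regressors(X, p):
    """Build the regressor matrix from a (T, n) array of n time series.

    Columns [j*p : (j+1)*p] hold lags 1..p of node j; row r corresponds
    to time t = p + r.
    """
    T, n = X.shape
    cols = [X[p - k:T - k, j] for j in range(n) for k in range(1, p + 1)]
    return np.column_stack(cols)

def group_lasso(Phi, y, groups, lam, n_iter=500):
    """Minimize (1/2m)||y - Phi b||^2 + lam * sum_g ||b_g||_2 by ISTA."""
    m = len(y)
    step = m / np.linalg.norm(Phi, 2) ** 2   # 1 / Lipschitz constant
    b = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = b - step * (Phi.T @ (Phi @ b - y)) / m
        for g in groups:                     # group soft-thresholding
            norm = np.linalg.norm(z[g])
            z[g] = 0.0 if norm <= step * lam else (1 - step * lam / norm) * z[g]
        b = z
    return b

def causal_graph(X, p, lam):
    """Boolean matrix A with A[i, j] True when past values of node j
    are selected to explain node i."""
    T, n = X.shape
    Phi = lagged_regressors(X, p)
    groups = [np.arange(j * p, (j + 1) * p) for j in range(n)]
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        b = group_lasso(Phi, X[p:, i], groups, lam)
        A[i] = [np.linalg.norm(b[g]) > 1e-8 for g in groups]
    return A
```

The choice of `lam` trades off missed links against spurious ones; in this sketch it is fixed by hand, whereas in practice it would have to be tuned to the noise level and the amount of data.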
2.5  Transitions 


The theoretical framework developed under this grant was used to develop a “contra-flow” detector to
support TSA agents by alerting them to potential attempts to breach secure areas. This technology was
tested at the Cleveland Hopkins International Airport for over a year, where it screened on average
50,000 passengers per week. This technology was showcased to the Hon. Janet Napolitano (U.S. Secretary
of Homeland Security, November 2012), Mr. John S. Pistole (TSA Administrator and former FBI Deputy
Director, June 2013) and the Hon. Theresa May (U.K. Home Secretary, Sept. 2014). It was also covered
in a N.Y. Times article that appeared on May 8, 2015.


Figure 9: Left: security breach detection technology being demonstrated to the Hon. J. Napolitano,
Homeland Security Secretary. Right: technology demo for Mr. J. Pistole, TSA Administrator and former
FBI Deputy Director.


Figure 10: Security breach detection technology deployed at the Cleveland Hopkins International Airport.


2.6  Disclaimer 

The  views  and  conclusions  contained  in  this  report  are  those  of  the  authors  and  should  not  be  interpreted  as 
necessarily  representing  the  official  policies  or  endorsements,  either  expressed  or  implied,  of  AFOSR  or  the 
U.S.  Government. 